Testing primers against PR2

1 Goal

  • Testing primers against the PR2 database (latest version 4.11.1)

2 Notes about script

  • Chunks compute_matches and store_matches have only to be run in 2 cases
    • new version of pr2
    • new set of primers
    • In the other case, they can be set eval=FALSE

2.1 To do

  • Check why there are some negative amplicon (probably some matches ahead - Put a condition that amplicon should be >)

4 Read primer file

4.3 Build the primer set table 1

region specificity primer_set_id primer_set_name fwd_name rev_name length_yeast
V4 1 Hadziavdic1 F-566 R-1200 635
V4 2 Hadziavdic2 A-528F R-952 379
V4 3 Hugerth1 574*f 1132r 576
V4 4 Hugerth2 563f 1132r 587
V4 5 Hugerth3 616f 1132r 534
V4 6 Hugerth4 616*f 1132r 534
V4 7 Bass V4_1f TAReukREV3 417
V4 8 StoeckV4 TAReuk454FWD1 TAReukREV3 417
V4 36 StoeckV4 TAReuk454FWD1 V4RB 417
V4 12 Geisen 3NDF 1132r 599
V4 13 Brate1 3NDF V4_euk_R1 465
V4 14 Brate2 3NDF V4_euk_R2 458
V4 15 Moreno EUKAF EUKAR 410
V4 16 PireddaV4 TAReuk454FWD1 V4 18S Next.Rev 417
V4 17 Comeau E572F E1009R 438
V4 18 Parfrey 515f 1119r 598
V4 22 Kim Nex_18S_0587_F Nex_18S_0964_R 419
V4 23 Venter 590F 1300R 720
V4 24 Simon EK-565F-NGS UNonMetR 527
V4 25 Mangot NSF563 NSR951 380
V4 26 Pawlowski Nex_18S_0587_F S12.2R 445
V4 34 Lambert 515F Univ short NSR951 391
V4 40 Zhan Uni18SF Uni18SR 461
V4-specific diatoms 21 Zimmerman D512for D978rev 437
V4-specific cercozoa 36 Harder Cerc479F Cerc750R 283
V4-specific ciliates 19 Vannini Claudia Vannini (F) Claudia Vannini (R) 439
V4-specific diplonemids 37 Cannon DimA DimB 295
V4-specific chlorophyta 38 Moro ChloroF ChloroR 494
V4-specific haptophyta 39 Egge A-528F PRYM01+7 396
V4-specific non-metazoa 35 Bower UNonMetF UNonMetR 578
V9 27 StoeckV9 1391F EukB 169
V9 28 Amaral1 1380F 1510R 176
V9 29 Amaral2 1389F 1510R 167
V9 31 PireddaV9 1388F 1510R 167
Universal 32 Wilkins 926wF 1392-R 513
Universal 33 Needham 515f Univ 926R 589

5 Computing the matches

This part is done by an R script PR2 Primers pr2_match.R that is executed in batch mode.

5.1 Programing Notes

  • Use Biostrings

Accessor methods : In the code snippets below, x is an MIndex object.

  • length(x): The number of patterns that matches are stored for.
  • names(x): The names of the patterns that matches are stored for.
  • startIndex(x): A list containing the starting positions of the matches for each pattern.
  • endIndex(x): A list containing the ending positions of the matches for each pattern.
  • elementNROWS(x): An integer vector containing the number of matches for each pattern.
  • x[[i]]: Extract the matches for the i-th pattern as an IRanges object.
  • unlist(x, recursive=TRUE, use.names=TRUE): Return all the matches in a single IRanges object. recursive and use.names are ignored.
  • extractAllMatches(subject, mindex): Return all the matches in a single XStringViews object.

One could also use another function which does not give the position * match_fwd <- vcountPattern(fwd, seq,max.mismatch=0, min.mismatch=0, with.indels=FALSE, fixed=FALSE, algorithm=“auto”)

8 Summarize the data in tables

9 Graphics

9.1 All Eukaryotes

Comments

  • Percent of sequences amplified
    • Logically, the % of seq amplified is always < min(% of sequences matching forwar, % of sequences matching reverse)
    • In general it is the reverse primer that causes problems.
    • Some primer sets do not amplify any sequence (11, 19, 20, 21)
    • For V9, primer set 30 reverse primer is in the ITS region which is not present in the PR2 sequences, so no amplification.
  • Amplicon sizes
    • Only 8 V4 primer sets are suitable for Illumina 2x250 (max amplicon size = 450 bp)
    • This is if we consider the mean… If we consider the variation around the mean then, only 3 suitable for 2x250
    • Five more V4 primer sets are suitable for Illumina 3x250 (max amplicon size = 550 bp)
for (one_region in regions) {
    
    pr2_match_summary_primer_set_region_long <- filter(pr2_match_summary_primer_set_long, region == 
        one_region)
    pr2_match_summary_primer_set_region <- filter(pr2_match_summary_primer_set, region == one_region)
    pr2_match_region <- filter(pr2_match_final, region == one_region)
    
    g <- ggplot(pr2_match_summary_primer_set_region_long) + geom_col(aes(x = reorder(primer_label, 
        primer_set_id), y = pct_seq, fill = fct_reorder(pct_category, pct_category_order)), 
        width = 0.7, position = "dodge") + theme_bw() + xlab("Primer set") + ylab("% of sequences amplified") + 
        scale_fill_manual(name = "% amplified", values = c(pct_amplicons = "black", pct_fwd = "blue", 
            pct_rev = "red"), labels = c("Primer fwd", "Primer rev", "Amplicons")) + ggtitle(str_c(one_region, 
        " - Percentage of sequences recovered")) + theme(axis.text.x = element_text(angle = 45, 
        hjust = 1))
    
    print(g)
    
    
    g <- ggplot(filter(pr2_match_summary_primer_set_region, !is.nan(ampli_size_mean))) + geom_point(aes(x = reorder(primer_label, 
        primer_set_id), y = ampli_size_mean), colour = "black") + geom_errorbar(aes(x = reorder(primer_label, 
        primer_set_id), ymax = ampli_size_mean + ampli_size_sd, ymin = ampli_size_mean - ampli_size_sd)) + 
        theme_bw() + xlab("Primer set") + ylab("Amplicon size (bp)") + # scale_y_continuous(breaks = (1:8)*200, limits = c(0,1500)) +
    ggtitle(str_c(one_region, " - Amplicon size - Lines correspond to limits for Illumina 2x250 and 2x300 respectively")) + 
        geom_hline(yintercept = c(450, 550), linetype = c(2, 3)) + theme(axis.text.x = element_text(angle = 45, 
        hjust = 1))
    
    print(g)
    
    g <- ggplot(pr2_match_region) + geom_boxplot(aes(x = reorder(primer_label, primer_set_id), 
        y = ampli_size), colour = "black", outlier.alpha = 0.3) + theme_bw() + xlab("Primer set") + 
        ylab("Amplicon size (bp)") + # scale_y_continuous(breaks = (1:8)*200, limits = c(0,1500)) +
    ggtitle(str_c(one_region, " - Amplicon size - Lines correspond to limits for Illumina 2x250 and 2x300 respectively")) + 
        geom_hline(yintercept = c(450, 550), linetype = c(2, 3)) + theme(axis.text.x = element_text(angle = 45, 
        hjust = 1))
    
    print(g)
}

Daniel Vaulot

12 06 2019